Weakly Supervised Learning of Heterogeneous Concepts in Videos
نویسندگان
چکیده
Typical textual descriptions that accompany online videos are ‘weak’: i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. Finally, we develop posterior inference for the proposed formulation using mean-field variational approximation. Comparative evaluations on the Casablanca and the A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques: 24% relative improvement for pairwise concept classification in the Casablanca dataset and 9% relative improvement for localization in the A2D dataset as compared to the most competitive baseline.
منابع مشابه
Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolving ambiguous visual-linguistic cues. Furthermore, dense ...
متن کاملSemi Supervised Learning in Wild Faces and Videos
We propose an approach for improving unconstrained face recognition based on leveraging weakly labeled web videos. It is easy to obtain videos that are likely to contain a face of interest from sites such as YouTube through issuing queries with a person’s name; however, many examples of faces not belonging to the person of interest will be present. We propose a new technique capable of learning...
متن کاملWeakly supervised activity analysis with spatio-temporal localisation
In computer vision, an increasing number of weakly annotated videos have become available, due to the fact it is often difficult and time consuming to annotate all the details in the videos collected. Learning methods that analyse human activities in weakly annotated video data have gained great interest in recent years. They are categorised as “weakly supervised learning”, and usually form a m...
متن کاملRegularized Multi-Concept MIL for weakly-supervised facial behavior categorization
In this work, we address the problem of estimating high-level semantic labels for videos of recorded people by means of analysing their facial expressions. This problem, to which we refer as facial behavior categorization, is a weakly-supervised learning problem where we do not have access to frame-by-frame facial gesture annotations but only weak-labels at the video level are available. Theref...
متن کاملWeakly supervised learning from images and videos∗
With the amount of on-line available digital content growing daily, large-scale, weakly supervised learning is becoming more and more important. In this talk we present some recent results for weakly supervised learning from images and videos. Standard approaches to object category localization require bounding box annotations of object instances. This time-consuming annotation process is sides...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016